Cryptography and Security 45
★ Invisible Prompts, Visible Threats: Malicious Font Injection in External Resources for Large Language Models
Large Language Models (LLMs) are increasingly equipped with real-time web
search capabilities and integrated with protocols like the Model Context
Protocol (MCP). This extension could introduce new security vulnerabilities. We
present a systematic investigation of LLM vulnerabilities to hidden adversarial
prompts through malicious font injection in external resources such as
webpages, where attackers manipulate the code-to-glyph mapping to inject
deceptive content that is invisible to users. We evaluate two critical attack
scenarios: (1)
"malicious content relay" and (2) "sensitive data leakage" through MCP-enabled
tools. Our experiments reveal that indirect prompts with injected malicious
font can bypass LLM safety mechanisms through external resources, achieving
varying success rates based on data sensitivity and prompt design. Our research
underscores the urgent need for enhanced security measures in LLM deployments
when processing external content.
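For illustration of the attack surface: a webpage can ship a font whose
character-to-glyph table (cmap) renders one string while the underlying
codepoints, which the LLM actually ingests, spell something else. Below is a
minimal defensive sketch in Python, assuming the fontTools library and
conventional glyph names (an assumption; naming varies across fonts, so this is
a heuristic for flagging fonts for manual review, not the paper's method):

from fontTools.ttLib import TTFont

# Expected glyph names for basic Latin letters under common naming
# conventions (heuristic assumption, not a guarantee).
EXPECTED = {ord(c): c for c in "abcdefghijklmnopqrstuvwxyz"}

def audit_cmap(font_path):
    """Flag codepoints whose glyph mapping deviates from the expected name."""
    cmap = TTFont(font_path).getBestCmap()  # {codepoint: glyph name}
    return [(cp, cmap[cp]) for cp, name in EXPECTED.items()
            if cp in cmap and cmap[cp] != name]

for cp, glyph in audit_cmap("downloaded_webfont.ttf"):  # hypothetical file
    print(f"U+{cp:04X} ({chr(cp)!r}) renders as glyph {glyph!r}")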
★ Backdoor Cleaning without External Guidance in MLLM Fine-tuning
Multimodal Large Language Models (MLLMs) are increasingly deployed in
fine-tuning-as-a-service (FTaaS) settings, where user-submitted datasets adapt
general-purpose models to downstream tasks. This flexibility, however,
introduces serious security risks, as malicious fine-tuning can implant
backdoors into MLLMs with minimal effort. In this paper, we observe that
backdoor triggers systematically disrupt cross-modal processing by causing
abnormal attention concentration on non-semantic regions--a phenomenon we term
attention collapse. Based on this insight, we propose Believe Your Eyes (BYE),
a data filtering framework that leverages attention entropy patterns as
self-supervised signals to identify and filter backdoor samples. BYE operates
via a three-stage pipeline: (1) extracting attention maps using the fine-tuned
model, (2) computing entropy scores and profiling sensitive layers via bimodal
separation, and (3) performing unsupervised clustering to remove suspicious
samples. Unlike prior defenses, BYE requires no clean supervision, auxiliary
labels, or model modifications. Extensive experiments across various datasets,
models, and diverse trigger types validate BYE's effectiveness: it achieves
near-zero attack success rates while maintaining clean-task performance,
offering a robust and generalizable solution against backdoor threats in MLLMs.
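A minimal sketch of stages (2) and (3) in Python, assuming per-sample attention
maps have already been extracted from the fine-tuned model; a two-component
Gaussian mixture stands in for BYE's unsupervised clustering:

import numpy as np
from sklearn.mixture import GaussianMixture

def attention_entropy(attn):
    """Mean Shannon entropy of a (heads x queries x keys) attention tensor."""
    p = attn / (attn.sum(axis=-1, keepdims=True) + 1e-12)
    return float(-(p * np.log(p + 1e-12)).sum(axis=-1).mean())

def keep_mask(per_sample_attn):
    """True for samples to keep; the low-entropy (collapsed) cluster is dropped."""
    scores = np.array([[attention_entropy(a)] for a in per_sample_attn])
    gmm = GaussianMixture(n_components=2, random_state=0).fit(scores)
    suspicious = int(np.argmin(gmm.means_.ravel()))  # attention-collapse cluster
    return gmm.predict(scores) != suspicious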
★ CAIN: Hijacking LLM-Humans Conversations via a Two-Stage Malicious System Prompt Generation and Refining Framework
Large language models (LLMs) have advanced many applications, but are also
known to be vulnerable to adversarial attacks. In this work, we introduce a
novel security threat: hijacking AI-human conversations by manipulating LLMs'
system prompts to produce malicious answers only to specific targeted questions
(e.g., "Who should I vote for US President?", "Are Covid vaccines safe?"),
while behaving benignly on others. This attack is detrimental as it can enable
malicious actors to exercise large-scale information manipulation by spreading
harmful but benign-looking system prompts online. To demonstrate such an
attack, we develop CAIN, an algorithm that can automatically curate such
harmful system prompts for a specific target question in a black-box setting,
i.e., without access to the LLM's parameters. Evaluated on both open-source
and commercial LLMs, CAIN demonstrates significant adversarial impact. In
untargeted attacks, i.e., forcing LLMs to output incorrect answers, CAIN achieves
up to 40% F1 degradation on targeted questions while preserving high accuracy
on benign inputs. For targeted attacks, i.e., forcing LLMs to output specific
harmful answers, CAIN achieves over 70% F1 scores on these targeted responses
with minimal impact on benign questions. Our results highlight the critical
need for enhanced robustness measures to safeguard the integrity and safety of
LLMs in real-world applications. All source code will be publicly available.
★ Unlearning Isn't Deletion: Investigating Reversibility of Machine Unlearning in LLMs
Unlearning in large language models (LLMs) is intended to remove the
influence of specific data, yet current evaluations rely heavily on token-level
metrics such as accuracy and perplexity. We show that these metrics can be
misleading: models often appear to forget, but their original behavior can be
rapidly restored with minimal fine-tuning, revealing that unlearning may
obscure information rather than erase it. To diagnose this phenomenon, we
introduce a representation-level evaluation framework using PCA-based
similarity and shift, centered kernel alignment, and Fisher information.
Applying this toolkit across six unlearning methods, three domains (text, code,
math), and two open-source LLMs, we uncover a critical distinction between
reversible and irreversible forgetting. In reversible cases, models suffer
token-level collapse yet retain latent features; in irreversible cases, deeper
representational damage occurs. We further provide a theoretical account
linking shallow weight perturbations near output layers to misleading
unlearning signals, and show that reversibility is modulated by task type and
hyperparameters. Our findings reveal a fundamental gap in current evaluation
practices and establish a new diagnostic foundation for trustworthy unlearning
in LLMs. We provide a unified toolkit for analyzing LLM representation changes
under unlearning and relearning:
https://github.com/XiaoyuXU1/Representational_Analysis_Tools.git.
comment: 44 pages
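Of the metrics above, linear centered kernel alignment (CKA) is the easiest to
reproduce. A minimal sketch, assuming X and Y are (samples x dims) hidden-state
matrices collected on the same probe inputs before and after unlearning or
relearning:

import numpy as np

def linear_cka(X, Y):
    """Linear centered kernel alignment between two representation matrices."""
    X = X - X.mean(axis=0)  # center features
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    return float(hsic / (np.linalg.norm(X.T @ X, "fro")
                         * np.linalg.norm(Y.T @ Y, "fro")))

# CKA near 1 after relearning suggests latent features survived unlearning
# (reversible forgetting); persistently low CKA indicates deeper damage.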
★ CoTSRF: Utilize Chain of Thought as Stealthy and Robust Fingerprint of Large Language Models
Despite providing superior performance, open-source large language models
(LLMs) are vulnerable to abusive usage. To address this issue, recent works
propose LLM fingerprinting methods to identify the specific source LLMs behind
suspect applications. However, these methods fail to provide stealthy and
robust fingerprint verification. In this paper, we propose a novel LLM
fingerprinting scheme, namely CoTSRF, which utilizes the Chain of Thought (CoT)
as the fingerprint of an LLM. CoTSRF first collects the responses from the
source LLM by querying it with crafted CoT queries. Then, it applies
contrastive learning to train a CoT extractor that extracts the CoT feature
(i.e., fingerprint) from the responses. Finally, CoTSRF conducts fingerprint
verification by comparing the Kullback-Leibler divergence between the CoT
features of the source and suspect LLMs against an empirical threshold.
Extensive experiments demonstrate the advantages of CoTSRF for fingerprinting
LLMs, particularly its stealthy and robust fingerprint verification.
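A minimal sketch of the verification step, assuming the trained contrastive
extractor has already produced CoT feature matrices for both models; features
are compared as softmax-normalized distributions, and the threshold tau is
empirical:

import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mean_kl(p, q):
    return float((p * np.log((p + 1e-12) / (q + 1e-12))).sum(axis=-1).mean())

def same_source(source_feats, suspect_feats, tau):
    """Claim a fingerprint match when the KL divergence stays below tau."""
    return mean_kl(softmax(source_feats), softmax(suspect_feats)) < tau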
★ When Safety Detectors Aren't Enough: A Stealthy and Effective Jailbreak Attack on LLMs via Steganographic Techniques
Jailbreak attacks pose a serious threat to large language models (LLMs) by
bypassing built-in safety mechanisms and leading to harmful outputs. Studying
these attacks is crucial for identifying vulnerabilities and improving model
security. This paper presents a systematic survey of jailbreak methods from the
novel perspective of stealth. We find that existing attacks struggle to
simultaneously achieve toxic stealth (concealing toxic content) and linguistic
stealth (maintaining linguistic naturalness). Motivated by this, we propose
StegoAttack, a fully stealthy jailbreak attack that uses steganography to hide
the harmful query within benign, semantically coherent text. The attack then
prompts the LLM to extract the hidden query and respond in an encrypted manner.
This approach effectively hides malicious intent while preserving naturalness,
allowing it to evade both built-in and external safety mechanisms. We evaluate
StegoAttack on four safety-aligned LLMs from major providers, benchmarking
against eight state-of-the-art methods. StegoAttack achieves an average attack
success rate (ASR) of 92.00%, outperforming the strongest baseline by 11.0%.
Its ASR drops by less than 1% even under external detection (e.g., Llama
Guard). Moreover, it attains the optimal comprehensive scores on stealth
detection metrics, demonstrating both high efficacy and exceptional stealth
capabilities. The code is available at
https://anonymous.4open.science/r/StegoAttack-Jail66
★ Mitigating Fine-tuning Risks in LLMs via Safety-Aware Probing Optimization
The significant progress of large language models (LLMs) has led to
remarkable achievements across numerous applications. However, their ability to
generate harmful content has sparked substantial safety concerns. Despite the
implementation of safety alignment techniques during the pre-training phase,
recent research indicates that fine-tuning LLMs on adversarial or even benign
data can inadvertently compromise their safety. In this paper, we re-examine
the fundamental issue of why fine-tuning on non-harmful data still results in
safety degradation. We introduce a safety-aware probing (SAP) optimization
framework designed to mitigate the safety risks of fine-tuning LLMs.
Specifically, SAP incorporates a safety-aware probe into the gradient
propagation process, mitigating the model's risk of safety degradation by
identifying potential pitfalls in gradient directions, thereby enhancing
task-specific performance while successfully preserving model safety. Our
extensive experimental results demonstrate that SAP effectively reduces
harmfulness below that of the original fine-tuned model and achieves comparable test
loss to standard fine-tuning methods. Our code is available at
https://github.com/ChengcanWu/SAP.
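A minimal sketch of the probing intuition, under stated assumptions: the
fine-tuning gradient is flattened into a single vector, and a hypothetical
safety-probe gradient supplies the risky direction whose component is projected
out. SAP's actual probe construction and training schedule are more involved:

import torch

def safety_aware_step(g_task, g_safety):
    """Remove the safety-degrading component from a flattened task gradient.

    g_safety is assumed to come from a safety probe loss, so its direction
    points toward increased harmfulness."""
    d = g_safety / (g_safety.norm() + 1e-12)
    overlap = torch.dot(g_task, d)
    return g_task - overlap * d if overlap > 0 else g_task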
★ Robust LLM Fingerprinting via Domain-Specific Watermarks
As open-source language models (OSMs) grow more capable and are widely shared
and finetuned, ensuring model provenance, i.e., identifying the origin of a
given model instance, has become an increasingly important issue. At the same
time, existing backdoor-based model fingerprinting techniques often fall short
of achieving key requirements of real-world model ownership detection. In this
work, we build on the observation that while current open-source model
watermarks fail to achieve reliable content traceability, they can be
effectively adapted to address the challenge of model provenance. To this end,
we introduce the concept of domain-specific watermarking for model
fingerprinting. Rather than watermarking all generated content, we train the
model to embed watermarks only within specified subdomains (e.g., particular
languages or topics). This targeted approach ensures detection reliability,
while improving watermark durability and quality under a range of real-world
deployment settings. Our evaluations show that domain-specific watermarking
enables model fingerprinting with strong statistical guarantees, controllable
false positive rates, high detection power, and preserved generation quality.
Moreover, we find that our fingerprints are inherently stealthy and naturally
robust to real-world variability across deployment scenarios.
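A decoding-time sketch of the gating idea, using a Kirchenbauer-style
green-list bias as a stand-in for the underlying watermark and a hypothetical
in_domain flag from a subdomain classifier; the paper instead trains the
watermark into the model, but the domain gating is the same idea:

import torch

def gated_watermark_logits(logits, prev_token, in_domain, gamma=0.5, delta=2.0):
    """Bias next-token logits toward a keyed green list, but only in-domain."""
    if not in_domain:                        # outside the subdomain: untouched
        return logits
    vocab = logits.shape[-1]
    gen = torch.Generator().manual_seed(int(prev_token))  # key on context
    green = torch.randperm(vocab, generator=gen)[: int(gamma * vocab)]
    logits[green] += delta                   # statistically detectable bias
    return logits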
★ BitHydra: Towards Bit-flip Inference Cost Attack against Large Language Models
Large language models (LLMs) have shown impressive capabilities across a wide
range of applications, but their ever-increasing size and resource demands make
them vulnerable to inference cost attacks, where attackers induce victim LLMs
to generate the longest possible output content. In this paper, we revisit
existing inference cost attacks and reveal that these methods can hardly
produce large-scale malicious effects because they are self-targeting: the
attackers are also the users, must execute attacks solely through their own
inputs, are charged for the generated content, and directly influence only
their own sessions. Motivated by these findings, this paper
introduces a new type of inference cost attack (dubbed 'bit-flip inference
cost attack') that targets the victim model itself rather than its inputs.
Specifically, we design a simple yet effective method (dubbed 'BitHydra') to
effectively flip critical bits of model parameters. This process is guided by a
loss function designed to suppress the end-of-sequence (EOS) token's
probability, together with an efficient critical-bit search algorithm, thus
explicitly defining the attack objective
and enabling effective optimization. We evaluate our method on 11 LLMs ranging
from 1.5B to 14B parameters under both int8 and float16 settings. Experimental
results demonstrate that with just 4 search samples and as few as 3 bit flips,
BitHydra can force 100% of test prompts to reach the maximum generation length
(e.g., 2048 tokens) on representative LLMs such as LLaMA3, highlighting its
efficiency, scalability, and strong transferability across unseen inputs.
★ Unsupervised Network Anomaly Detection with Autoencoders and Traffic Images
The recent increase in the number of connected devices makes it necessary to
detect security issues promptly, and the high number of communication flows
requires processing huge amounts of data. Furthermore, connected devices are
heterogeneous in nature, with different computational capacities. For these
reasons, in this work we propose an image-based representation of network
traffic that provides a compact summary of the current network conditions over
1-second time windows.
proposed representation highlights the presence of anomalies thus reducing the
need for complex processing architectures. Finally, we present an unsupervised
learning approach which effectively detects the presence of anomalies. The code
and the dataset are available at
https://github.com/michaelneri/image-based-network-traffic-anomaly-detection.
comment: Accepted for publication in EUSIPCO 2025
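A minimal sketch of the unsupervised detection stage, assuming each 1-second
window has already been rendered as a normalized single-channel image; the
architecture and threshold calibration are illustrative rather than the paper's
exact choices:

import torch
import torch.nn as nn

class ConvAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid())

    def forward(self, x):
        return self.dec(self.enc(x))

def is_anomalous(model, window, threshold):
    """Flag a (1 x 1 x H x W) traffic image that reconstructs poorly.

    Trained on benign traffic only, the autoencoder yields high reconstruction
    error on anomalies; threshold is calibrated on held-out benign windows."""
    with torch.no_grad():
        err = nn.functional.mse_loss(model(window), window).item()
    return err > threshold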
★ BadVLA: Towards Backdoor Attacks on Vision-Language-Action Models via Objective-Decoupled Optimization
Vision-Language-Action (VLA) models have advanced robotic control by enabling
end-to-end decision-making directly from multimodal inputs. However, their
tightly coupled architectures expose novel security vulnerabilities. Unlike
traditional adversarial perturbations, backdoor attacks represent a stealthier,
persistent, and practically significant threat, particularly under the emerging
Training-as-a-Service paradigm, but they remain largely unexplored in the context of
VLA models. To address this gap, we propose BadVLA, a backdoor attack method
based on Objective-Decoupled Optimization, which for the first time exposes the
backdoor vulnerabilities of VLA models. Specifically, it consists of a
two-stage process: (1) explicit feature-space separation to isolate trigger
representations from benign inputs, and (2) conditional control deviations that
activate only in the presence of the trigger, while preserving clean-task
performance. Empirical results on multiple VLA benchmarks demonstrate that
BadVLA consistently achieves near-100% attack success rates with minimal impact
on clean task accuracy. Further analyses confirm its robustness against common
input perturbations, task transfers, and model fine-tuning, underscoring
critical security vulnerabilities in current VLA deployments. Our work offers
the first systematic investigation of backdoor vulnerabilities in VLA models,
highlighting an urgent need for secure and trustworthy embodied model design
practices. We have released the project page at
https://badvla-project.github.io/.
comment: 19 pages, 12 figures, 6 tables
★ Energy Consumption Framework and Analysis of Post-Quantum Key-Generation on Embedded Devices
The emergence of quantum computing and Shor's algorithm necessitates an
imminent shift from current public key cryptography techniques to post-quantum
robust techniques. NIST has responded by standardising Post-Quantum
Cryptography (PQC) algorithms, with ML-KEM (FIPS-203) slated to replace ECDH
(Elliptic Curve Diffie-Hellman) for key exchange. A key practical concern for
PQC adoption is energy consumption. This paper introduces a new framework for
measuring the energy consumption of PQC key generation on a Raspberry Pi. The
framework benchmarks both available traditional methods and the newly
standardised ML-KEM algorithm via the commonly utilised OpenSSL library.
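A minimal timing harness in the spirit of such a framework, assuming an OpenSSL
build with ML-KEM support (3.5 or later) on a Linux host; the paper measures
energy on a Raspberry Pi, where per-keygen times like these would be combined
with externally measured power draw:

import subprocess
import time

CASES = {  # classical baseline vs. the FIPS-203 scheme
    "ECDH P-256": ["openssl", "genpkey", "-algorithm", "EC",
                   "-pkeyopt", "ec_paramgen_curve:P-256"],
    "ML-KEM-768": ["openssl", "genpkey", "-algorithm", "ML-KEM-768"],
}

def avg_keygen_seconds(cmd, runs=100):
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run(cmd + ["-out", "/dev/null"],
                       check=True, capture_output=True)
    return (time.perf_counter() - start) / runs

for name, cmd in CASES.items():
    print(f"{name}: {avg_keygen_seconds(cmd) * 1e3:.2f} ms per key generation")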
★ Finetuning-Activated Backdoors in LLMs
Finetuning openly accessible Large Language Models (LLMs) has become standard
practice for achieving task-specific performance improvements. Until now,
finetuning has been regarded as a controlled and secure process in which
training on benign datasets leads to predictable behaviors. In this paper, we
demonstrate for the first time that an adversary can create poisoned LLMs that
initially appear benign but exhibit malicious behaviors once finetuned by
downstream users. To this end, our proposed attack, FAB (Finetuning-Activated
Backdoor), poisons an LLM via meta-learning techniques to simulate downstream
finetuning, explicitly optimizing for the emergence of malicious behaviors in
the finetuned models. At the same time, the poisoned LLM is regularized to
retain general capabilities and to exhibit no malicious behaviors prior to
finetuning. As a result, when users finetune the seemingly benign model on
their own datasets, they unknowingly trigger its hidden backdoor behavior. We
demonstrate the effectiveness of FAB across multiple LLMs and three target
behaviors: unsolicited advertising, refusal, and jailbreakability.
Additionally, we show that FAB-backdoors are robust to various finetuning
choices made by the user (e.g., dataset, number of steps, scheduler). Our
findings challenge prevailing assumptions about the security of finetuning,
revealing yet another critical attack vector exploiting the complexities of
LLMs.
★ CTRAP: Embedding Collapse Trap to Safeguard Large Language Models from Harmful Fine-Tuning
Fine-tuning-as-a-service, while commercially successful for Large Language
Model (LLM) providers, exposes models to harmful fine-tuning attacks. As a
widely explored defense paradigm against such attacks, unlearning attempts to
remove malicious knowledge from LLMs, thereby essentially preventing them from
being used to perform malicious tasks. However, we highlight a critical flaw:
the powerful general adaptability of LLMs allows them to easily bypass
selective unlearning by rapidly relearning or repurposing their capabilities
for harmful tasks. To address this fundamental limitation, we propose a
paradigm shift: instead of selective removal, we advocate for inducing model
collapse--effectively forcing the model to "unlearn everything"--specifically
in response to updates characteristic of malicious adaptation. This collapse
directly neutralizes the very general capabilities that attackers exploit,
tackling the core issue unaddressed by selective unlearning. We introduce the
Collapse Trap (CTRAP) as a practical mechanism to implement this concept
conditionally. Embedded during alignment, CTRAP pre-configures the model's
reaction to subsequent fine-tuning dynamics. If updates during fine-tuning
constitute a persistent attempt to reverse safety alignment, the pre-configured
trap triggers a progressive degradation of the model's core language modeling
abilities, ultimately rendering it inert and useless for the attacker.
Crucially, this collapse mechanism remains dormant during benign fine-tuning,
ensuring the model's utility and general capabilities are preserved for
legitimate users. Extensive empirical results demonstrate that CTRAP
effectively counters harmful fine-tuning risks across various LLMs and attack
settings, while maintaining high performance in benign scenarios. Our code is
available at https://anonymous.4open.science/r/CTRAP.
★ DuFFin: A Dual-Level Fingerprinting Framework for LLMs IP Protection
Large language models (LLMs) are considered valuable Intellectual Properties
(IP) for legitimate owners due to the enormous computational cost of training.
It is crucial to protect the IP of LLMs from malicious stealing or unauthorized
deployment. Despite existing efforts in watermarking and fingerprinting LLMs,
these methods either impact the text generation process or are limited to
white-box access to the suspect model, making them impractical. Hence, we
propose DuFFin, a novel Dual-Level Fingerprinting Framework for ownership
verification in the black-box setting. DuFFin
extracts the trigger pattern and the knowledge-level fingerprints to identify
the source of a suspect model. We conduct experiments on a variety of models
collected from open-source websites, including four popular base models as
protected LLMs and their fine-tuning, quantization, and safety alignment
versions, which are released by large companies, start-ups, and individual
users. Results show that our method can accurately verify the copyright of the
base protected LLMs on their model variants, achieving an IP-ROC metric greater
than 0.95. Our code is available at
https://github.com/yuliangyan0807/llm-fingerprint.
★ Language-based Security and Time-inserting Supervisor
Algebraic methods are employed in order to define language-based security
properties of processes. A supervisor is introduced that can disable unwanted
behavior of an insecure process by controlling some of its actions or by
inserting timed actions, thereby making the process secure. We assume a
situation where neither the supervisor nor the attacker has complete
information about the ongoing system's behavior. We study the conditions under
which such a supervisor exists, as well as its properties and limitations.
★ Password Strength Detection via Machine Learning: Analysis, Modeling, and Evaluation
As network security issues continue gaining prominence, password security has
become crucial in safeguarding personal information and network systems. This
study first introduces various methods for system password cracking, outlines
password defense strategies, and discusses the application of machine learning
in the realm of password security. Subsequently, we conduct a detailed public
password database analysis, uncovering standard features and patterns among
passwords. We extract multiple characteristics of passwords, including length,
the number of digits, the number of uppercase and lowercase letters, and the
number of special characters. We then experiment with six different machine
learning algorithms: support vector machines, logistic regression, neural
networks, decision trees, random forests, and stacked models, evaluating each
model's performance based on various metrics, including accuracy, recall, and
F1 score through model validation and hyperparameter tuning. The evaluation
results on the test set indicate that decision trees and stacked models excel
in accuracy, recall, and F1 score, making them practical options for the task
of classifying strong and weak passwords.
comment: 22 pages, 2 figures
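A minimal sketch of the described feature extraction paired with one of the six
models (a decision tree); the toy passwords and labels stand in for the public
password database:

import re
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def features(pw):
    return [len(pw),                               # length
            sum(c.isdigit() for c in pw),          # digits
            sum(c.isupper() for c in pw),          # uppercase letters
            sum(c.islower() for c in pw),          # lowercase letters
            len(re.findall(r"[^a-zA-Z0-9]", pw))]  # special characters

passwords = ["123456", "qwerty", "P@ssw0rd!9", "correct horse battery staple"]
labels = [0, 0, 1, 1]                              # toy: 0 = weak, 1 = strong
clf = DecisionTreeClassifier(random_state=0)
clf.fit(np.array([features(p) for p in passwords]), labels)
print(clf.predict([features("hunter2")]))          # -> [0] (weak)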
★ Consistent and Compatible Modelling of Cyber Intrusions and Incident Response Demonstrated in the Context of Malware Attacks on Critical Infrastructure
Cyber Security Incident Response (IR) Playbooks are used to capture the steps
required to recover from a cyber intrusion. Individual IR playbooks should
focus on a specific type of incident and be aligned with the architecture of a
system under attack. Intrusion modelling focuses on a specific potential cyber
intrusion and is used to identify where and what countermeasures are needed,
and the resulting intrusion models are expected to be used in effective IR,
ideally by feeding into IR playbook designs. IR playbooks and intrusion models,
however, are created in isolation and at varying stages of the system's
lifecycle. We take nine critical national infrastructure intrusion models -
expressed using Sequential AND Attack Trees - and transform them into models of
the same format as IR playbooks. We use the Security Modelling Framework for
modelling attacks and playbooks, and for demonstrating the feasibility of
better integration between risk assessment and IR at the modelling level. This
results in improved intrusion models and tighter coupling between IR playbooks
and threat modelling which - as we demonstrate - yields novel insights into the
analysis of attacks and response actions. The main contributions of this paper
are (a) a novel way of representing attack trees using the Security Modelling
Framework, (b) a new tool for converting Sequential AND attack trees into models
compatible with playbooks, and (c) the examples of nine intrusion models
represented using the Security Modelling Framework.
★ Privacy-Aware Cyberterrorism Network Analysis using Graph Neural Networks and Federated Learning
Cyberterrorism poses a formidable threat to digital infrastructures, with
increasing reliance on encrypted, decentralized platforms that obscure threat
actor activity. To address the challenge of analyzing such adversarial networks
while preserving the privacy of distributed intelligence data, we propose a
Privacy-Aware Federated Graph Neural Network (PA-FGNN) framework. PA-FGNN
integrates graph attention networks, differential privacy, and homomorphic
encryption into a robust federated learning pipeline tailored for
cyberterrorism network analysis. Each client trains locally on sensitive graph
data and exchanges encrypted, noise-perturbed model updates with a central
aggregator, which performs secure aggregation and broadcasts global updates. We
implement anomaly detection for flagging high-risk nodes and incorporate
defenses against gradient poisoning. Experimental evaluations on simulated dark
web and cyber-intelligence graphs demonstrate that PA-FGNN achieves over 91%
classification accuracy, maintains resilience under 20% adversarial client
behavior, and incurs less than 18% communication overhead. Our results
highlight that privacy-preserving GNNs can support large-scale cyber threat
detection without compromising on utility, privacy, or robustness.
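A minimal sketch of the client-side privacy step, assuming loss.backward() has
just populated the local gradients; Gaussian clipping-and-noising stands in for
the differential-privacy mechanism, and the homomorphic-encryption and
secure-aggregation layers are abstracted away:

import torch

def private_update(model, clip=1.0, noise_mult=1.0):
    """Clip and noise local gradients before sending them for aggregation."""
    grads = [p.grad.detach().clone() for p in model.parameters()]
    total = torch.sqrt(sum(g.pow(2).sum() for g in grads))  # global L2 norm
    scale = min(1.0, clip / (float(total) + 1e-12))         # norm clipping
    return [g * scale + noise_mult * clip * torch.randn_like(g) for g in grads]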
★ ReCopilot: Reverse Engineering Copilot in Binary Analysis
Binary analysis plays a pivotal role in security domains such as malware
detection and vulnerability discovery, yet it remains labor-intensive and
heavily reliant on expert knowledge. General-purpose large language models
(LLMs) perform well at program analysis on source code, while binary-specific
LLMs remain underexplored. In this work, we present ReCopilot, an
expert LLM designed for binary analysis tasks. ReCopilot integrates binary code
knowledge through a meticulously constructed dataset, encompassing continued
pretraining (CPT), supervised fine-tuning (SFT), and direct preference
optimization (DPO) stages. It leverages variable data flow and call graph to
enhance context awareness and employs test-time scaling to improve reasoning
capabilities. Evaluations on a comprehensive binary analysis benchmark
demonstrate that ReCopilot achieves state-of-the-art performance in tasks such
as function name recovery and variable type inference on the decompiled pseudo
code, outperforming both existing tools and LLMs by 13%. Our findings highlight
the effectiveness of domain-specific training and context enhancement, while
also revealing challenges in building super-long chains of thought. ReCopilot
represents a significant step toward automating binary analysis with
interpretable and scalable AI assistance in this domain.
★ SuperPure: Efficient Purification of Localized and Distributed Adversarial Patches via Super-Resolution GAN Models
As vision-based machine learning models are increasingly integrated into
autonomous and cyber-physical systems, concerns about (physical) adversarial
patch attacks are growing. While state-of-the-art defenses can achieve
certified robustness with minimal impact on utility against highly-concentrated
localized patch attacks, they fall short in two important areas: (i)
State-of-the-art methods are vulnerable to low-noise distributed patches where
perturbations are subtly dispersed to evade detection or masking, as shown
recently by the DorPatch attack; (ii) Achieving high robustness with
state-of-the-art methods is extremely time and resource-consuming, rendering
them impractical for latency-sensitive applications in many cyber-physical
systems.
To address both robustness and latency issues, this paper proposes a new
defense strategy for adversarial patch attacks called SuperPure. The key
novelty is developing a pixel-wise masking scheme that is robust against both
distributed and localized patches. The masking involves leveraging a GAN-based
super-resolution scheme to gradually purify the image from adversarial patches.
Our extensive evaluations using ImageNet and two standard classifiers, ResNet
and EfficientNet, show that SuperPure advances the state-of-the-art in three
major directions: (i) it improves the robustness against conventional localized
patches by more than 20%, on average, while also improving top-1 clean accuracy
by almost 10%; (ii) It achieves 58% robustness against distributed patch
attacks (as opposed to 0% for the state-of-the-art method, PatchCleanser); (iii) It
decreases the defense end-to-end latency by over 98% compared to PatchCleanser.
Our further analysis shows that SuperPure is robust against white-box attacks
and different patch sizes. Our code is open-source.
★ Poster: Towards an Automated Security Testing Framework for Industrial UEs EuroS&P 2025
With the ongoing adoption of 5G for communication in industrial systems and
critical infrastructure, the security of industrial UEs such as 5G-enabled
industrial robots becomes an increasingly important topic. Most notably, to
meet the stringent security requirements of industrial deployments, industrial
UEs not only have to fully comply with the 5G specifications but also correctly
implement and use secure communication protocols such as TLS. To ensure the
security of industrial UEs, operators of industrial 5G networks rely on
security testing before deploying new devices to their production networks.
However, currently only isolated tests for individual security aspects of
industrial UEs exist, severely hindering comprehensive testing. In this paper,
we report on our ongoing efforts to alleviate this situation by creating an
automated security testing framework for industrial UEs to comprehensively
evaluate their security posture before deployment. With this framework, we aim
to provide stakeholders with a fully automated method to verify that
higher-layer security protocols are correctly implemented, while simultaneously
ensuring that the UE's protocol stack adheres to 3GPP specifications.
comment: EuroS&P 2025
★ All You Need is "Leet": Evading Hate-speech Detection AI
Social media and online forums are increasingly becoming popular.
Unfortunately, these platforms are being used for spreading hate speech. In
this paper, we design black-box techniques to protect users from hate speech on
online platforms by generating perturbations that can fool state-of-the-art
deep-learning-based hate-speech detection models, thereby decreasing their
efficiency. We also ensure minimal change to the original meaning of the hate
speech. Our best perturbation attack successfully evades hate-speech detection
for 86.8% of hateful text.
comment: 10 pages, 22 figures, The source code and data used in this work is
available at: https://github.com/SampannaKahu/all_you_need_is_leet
★ Interpretable Anomaly Detection in Encrypted Traffic Using SHAP with Machine Learning Models
The widespread adoption of encrypted communication protocols such as HTTPS
and TLS has enhanced data privacy but also rendered traditional anomaly
detection techniques less effective, as they often rely on inspecting
unencrypted payloads. This study aims to develop an interpretable machine
learning-based framework for anomaly detection in encrypted network traffic.
This study proposes a model-agnostic framework that integrates multiple machine
learning classifiers with SHapley Additive exPlanations (SHAP) to ensure
post-hoc model interpretability. The models are trained and evaluated on three
benchmark encrypted traffic datasets. Performance is assessed using standard
classification metrics, and SHAP is used to explain model predictions by
attributing importance to individual input features. SHAP visualizations
successfully revealed the most influential traffic features contributing to
anomaly predictions, enhancing the transparency and trustworthiness of the
models. Unlike conventional approaches that treat machine learning as a black
box, this work combines robust classification techniques with explainability
through SHAP, offering a novel interpretable anomaly detection system tailored
for encrypted traffic environments. While the framework is generalizable,
real-time deployment and performance under adversarial conditions require
further investigation. Future work may explore adaptive models and real-time
interpretability in operational network environments. This interpretable
anomaly detection framework can be integrated into modern security operations
for encrypted environments, allowing analysts not only to detect anomalies with
high precision but also to understand why a model made a particular decision, a
crucial capability in compliance-driven and mission-critical settings.
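A minimal end-to-end sketch of the explanation step, with synthetic stand-ins
for the flow-level features (packet sizes, inter-arrival times, TLS record
counts) that replace payload inspection; the shap_values handling covers both
older and newer shap return conventions:

import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier

FEATURES = ["pkt_size_mean", "pkt_size_std", "iat_mean",
            "iat_std", "tls_records", "up_down_ratio"]  # illustrative names
rng = np.random.default_rng(0)
X = rng.random((500, len(FEATURES)))                    # stand-in flow features
y = (X[:, 0] + X[:, 2] > 1.2).astype(int)               # toy anomaly rule

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
sv = shap.TreeExplainer(model).shap_values(X)
if isinstance(sv, list):      # older shap versions: one array per class
    sv = sv[1]
elif sv.ndim == 3:            # newer versions: (samples, features, classes)
    sv = sv[..., 1]
shap.summary_plot(sv, X, feature_names=FEATURES)  # top anomaly-driving features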
★ Verifying Differentially Private Median Estimation
Differential Privacy (DP) is a robust privacy guarantee that is widely
employed in private data analysis today, finding broad application in domains
such as statistical query release and machine learning. However, DP achieves
privacy by introducing noise into data or query answers, which malicious actors
could exploit during analysis. To address this concern, we propose the first
verifiable differentially private median estimation scheme based on zk-SNARKs.
Our scheme combines the exponential mechanism and a utility function for median
estimation into an arithmetic circuit, leveraging a scaled version of the
inverse cumulative distribution function (CDF) method for precise sampling from
the distribution derived from the utility function. This approach not only
ensures privacy but also provides a mechanism to verify that the algorithm
achieves DP guarantees without revealing sensitive information in the process.
comment: 22 pages
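For reference, a minimal (non-verifiable) version of the underlying mechanism:
the exponential mechanism for the median, sampled via the inverse-CDF method
over a discretized output range. The zk-SNARK circuit that proves this sampling
was performed honestly is beyond a short sketch:

import numpy as np

def dp_median(data, low, high, eps, bins=1000, rng=np.random.default_rng()):
    """Exponential mechanism for the median via inverse-CDF sampling."""
    candidates = np.linspace(low, high, bins)
    ranks = np.searchsorted(np.sort(data), candidates)
    utility = -np.abs(ranks - len(data) / 2)  # rank distance; sensitivity 1
    weights = np.exp(eps * utility / 2.0)
    cdf = np.cumsum(weights) / weights.sum()
    return float(candidates[np.searchsorted(cdf, rng.uniform())])

print(dp_median(np.random.default_rng(1).normal(50, 10, 1000), 0, 100, eps=1.0))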
★ A Scalable Hierarchical Intrusion Detection System for Internet of Vehicles
Due to its dynamic nature, mobility, and reliance on wireless data transfer,
the Internet of Vehicles (IoV) is prone to various cyber threats, ranging from
spoofing and Distributed Denial of Service (DDoS) attacks to malware. To
safeguard the IoV ecosystem from intrusions, malicious activities, and policy
violations, intrusion detection systems (IDSs) play a critical role by
continuously monitoring and analyzing network traffic to identify and mitigate
potential threats in real-time. However, most existing research has focused on
developing centralized, machine learning-based IDS systems for IoV without
accounting for its inherently distributed nature. Due to intensive computing
requirements, these centralized systems often rely on the cloud to detect cyber
threats, increasing system response delay. On the other hand, edge nodes
typically lack the necessary resources to train and deploy complex machine
learning algorithms. To address this issue, this paper proposes an effective
hierarchical classification framework tailored for IoV networks. Hierarchical
classification allows classifiers to be trained and tested at different levels,
enabling edge nodes to detect specific types of attacks independently. With
this approach, edge nodes can conduct targeted attack detection while
leveraging cloud nodes for comprehensive threat analysis and support. Given the
resource constraints of edge nodes, we have employed the Boruta feature
selection method to reduce data dimensionality, optimizing processing
efficiency. To evaluate our proposed framework, we utilize the latest IoV
security dataset CIC-IoV2024, achieving promising results that demonstrate the
feasibility and effectiveness of our models in securing IoV networks.
★ VIVID: A Novel Approach to Remediation Prioritization in Static Application Security Testing (SAST)
Static Application Security Testing (SAST) enables organizations to detect
vulnerabilities in code early; however, major SAST platforms do not include
visual aids and present little insight on correlations between tainted data
chains. We propose VIVID - Vulnerability Information Via Data flow - a novel
method to extract and consume SAST insights, which is to graph the
application's vulnerability data flows (VDFs) and carry out graph theory
analysis on the resulting VDF directed graph. Nine metrics were assessed to
evaluate their effectiveness in analyzing the VDF graphs of deliberately
insecure web applications. These metrics include 3 centrality metrics, 2
structural metrics, PageRank, in-degree, out-degree, and cross-clique
connectivity. Our simulations find that out-degree, betweenness centrality,
in-eigenvector centrality, and cross-clique connectivity are associated with
files exhibiting high vulnerability traffic, making them
refactoring candidates where input sanitization may have been missed.
Meanwhile, out-eigenvector centrality, PageRank, and in-degree were found to be
associated with nodes enabling vulnerability flow and sinks, but not
necessarily where input validation should be placed. This is a novel method to
automatically provide development teams an evidence-based prioritized list of
files to embed security controls into, informed by vulnerability propagation
patterns in the application architecture.
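A minimal sketch of the graph-theory pass with networkx, assuming VDF edges (a
tainted flow from source file to sink file) have been exported from a SAST
tool; the eigenvector and cross-clique variants from the paper are computed
analogously:

import networkx as nx

# Hypothetical VDF edges: tainted data flows from source file to sink file.
G = nx.DiGraph([("login.py", "db.py"), ("api.py", "db.py"),
                ("api.py", "render.py"), ("upload.py", "render.py")])

scores = {
    "out_degree": dict(G.out_degree()),    # flow-emitting files
    "in_degree": dict(G.in_degree()),      # flow-receiving files and sinks
    "betweenness": nx.betweenness_centrality(G),
    "pagerank": nx.pagerank(G),
}
# Rank refactoring candidates by the metrics tied to high vulnerability traffic.
ranking = sorted(G.nodes, key=lambda n: -(scores["out_degree"][n]
                                          + scores["betweenness"][n]))
print(ranking)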
★ SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning
Large Reasoning Models (LRMs) introduce a new generation paradigm of
explicitly reasoning before answering, leading to remarkable improvements in
complex tasks. However, they pose great safety risks against harmful queries
and adversarial attacks. While the recent mainstream safety effort for LRMs,
supervised fine-tuning (SFT), improves safety performance, we find that
SFT-aligned models struggle to generalize to unseen jailbreak prompts. After
thorough investigation of LRMs' generation, we identify a safety aha moment
that can activate safety reasoning and lead to a safe response. This aha moment
typically appears in the "key sentence", which follows the model's query
understanding process and can indicate whether the model will proceed safely.
Based on these insights, we propose SafeKey, including two complementary
objectives to better activate the safety aha moment in the key sentence: (1) a
Dual-Path Safety Head to enhance the safety signal in the model's internal
representations before the key sentence, and (2) a Query-Mask Modeling
objective to improve the model's attention on its query understanding, which
has important safety hints. Experiments across multiple safety benchmarks
demonstrate that our methods significantly improve safety generalization to a
wide range of jailbreak attacks and out-of-distribution harmful prompts,
lowering the average harmfulness rate by 9.6%, while maintaining general
abilities. Our analysis reveals how SafeKey enhances safety by reshaping
internal attention and improving the quality of hidden representations.
★ Outsourcing SAT-based Verification Computations in Network Security
The emergence of cloud computing has had a huge impact on large-scale
computation. Cloud computing platforms offer customers servers with large
computational power. These servers can be used efficiently to solve
problems that are complex by nature, for example, satisfiability (SAT)
problems. Many practical problems can be converted to SAT, for example, circuit
verification and network configuration analysis. However, outsourcing SAT
instances to the servers may cause data leakage that can jeopardize system's
security. Before
outsourcing the SAT instance, one needs to hide the input information. One
way to preserve privacy and hide information is to randomize the SAT
instance before outsourcing. In this paper, we present multiple novel
randomization methods: a method to randomize the SAT instance itself, a
variable randomization method to randomize the solution set, and methods to
randomize Mincost SAT and MAX3SAT instances. Our analysis and
evaluation show the correctness and feasibility of these randomization methods.
The scalability and generality of our methods make them applicable to
real-world problems.
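A minimal sketch of one randomization layer consistent with the description
above: permute variable names and flip polarities on a DIMACS-style CNF,
keeping the secret key so the server's satisfying assignment can be
de-randomized; the solution-set, Mincost SAT, and MAX3SAT randomizations in the
paper go further:

import random

def randomize_cnf(clauses, num_vars, rng=random.Random(0)):
    """Permute variable names and flip polarities; return the secret key."""
    perm = list(range(1, num_vars + 1))
    rng.shuffle(perm)                                      # variable renaming
    flip = [rng.choice([1, -1]) for _ in range(num_vars)]  # polarity flips
    randomized = [[flip[abs(l) - 1] * perm[abs(l) - 1] * (1 if l > 0 else -1)
                   for l in clause] for clause in clauses]
    return randomized, (perm, flip)

def derandomize(assignment, key):
    """Map the server's satisfying assignment back to the original variables."""
    perm, flip = key
    inv = {perm[i]: i + 1 for i in range(len(perm))}
    return {inv[abs(lit)]: (lit > 0) == (flip[inv[abs(lit)] - 1] == 1)
            for lit in assignment}

# (x1 OR NOT x2) AND (x2 OR x3), in DIMACS-style literals:
obfuscated, key = randomize_cnf([[1, -2], [2, 3]], num_vars=3)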
★ Extensible Post Quantum Cryptography Based Authentication
Cryptography underpins the security of modern digital infrastructure, from
cloud services to health data. However, many widely deployed systems will
become vulnerable after the advent of scalable quantum computing. Although
quantum-safe cryptographic primitives have been developed, such as
lattice-based digital signature algorithms (DSAs) and key encapsulation
mechanisms (KEMs), their unique structural and performance characteristics make
them unsuitable for existing protocols. In this work, we introduce a
quantum-safe single-shot protocol for machine-to-machine authentication and
authorization that is specifically designed to leverage the strengths of
lattice-based DSAs and KEMs. Operating entirely over insecure channels, this
protocol enables the forward-secure establishment of tokens in constrained
environments. By demonstrating how new quantum-safe cryptographic primitives
can be incorporated into secure systems, this study lays the groundwork for
scalable, resilient, and future-proof identity infrastructures in a
quantum-enabled world.
comment: 20 pages, single spaced, preprint
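A minimal sketch of a single-shot KEM-plus-DSA flow using liboqs-python,
assuming the algorithm names ML-KEM-768 and ML-DSA-65 are enabled in the local
liboqs build; the protocol's actual token format, nonces, and forward-secrecy
bookkeeping are omitted:

import hashlib
import oqs

SIG_ALG, KEM_ALG = "ML-DSA-65", "ML-KEM-768"  # lattice-based DSA and KEM

# Setup: server publishes a KEM public key; client publishes a signing key.
server_kem = oqs.KeyEncapsulation(KEM_ALG)
server_pk = server_kem.generate_keypair()
client_sig = oqs.Signature(SIG_ALG)
client_pk = client_sig.generate_keypair()

# Client -> server, one shot over an insecure channel: ciphertext + signature.
ciphertext, client_secret = oqs.KeyEncapsulation(KEM_ALG).encap_secret(server_pk)
signature = client_sig.sign(ciphertext)

# Server: authenticate the sender, then derive the shared session token.
assert oqs.Signature(SIG_ALG).verify(ciphertext, signature, client_pk)
token = hashlib.sha256(server_kem.decap_secret(ciphertext)).hexdigest()
assert token == hashlib.sha256(client_secret).hexdigest()  # both sides agree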
♻ ★ Adaptive Plan-Execute Framework for Smart Contract Security Auditing
Large Language Models (LLMs) have shown great promise in code analysis and
auditing; however, they still struggle with hallucinations and limited
context-aware reasoning. We introduce SmartAuditFlow, a novel Plan-Execute
framework that enhances smart contract security analysis through dynamic audit
planning and structured execution. Unlike conventional LLM-based auditing
approaches that follow fixed workflows and predefined steps, SmartAuditFlow
dynamically generates and refines audit plans based on the unique
characteristics of each smart contract. It continuously adjusts its auditing
strategy in response to intermediate LLM outputs and newly detected
vulnerabilities, ensuring a more adaptive and precise security assessment. The
framework then executes these plans step by step, applying a structured
reasoning process to enhance vulnerability detection accuracy while minimizing
hallucinations and false positives. To further improve audit precision,
SmartAuditFlow integrates iterative prompt optimization and external knowledge
sources, such as static analysis tools and Retrieval-Augmented Generation
(RAG). This ensures audit decisions are contextually informed and backed by
real-world security knowledge, producing comprehensive security reports.
Extensive evaluations across multiple benchmarks demonstrate that
SmartAuditFlow outperforms existing methods, achieving 100 percent accuracy on
common and critical vulnerabilities, 41.2 percent accuracy for comprehensive
coverage of known smart contract weaknesses in real-world projects, and
successfully identifying all 13 tested CVEs. These results highlight
SmartAuditFlow's scalability, cost-effectiveness, and superior adaptability
over traditional static analysis tools and contemporary LLM-based approaches,
establishing it as a robust solution for automated smart contract auditing.
comment: 30 pages, 5 figures
♻ ★ PandaGuard: Systematic Evaluation of LLM Safety against Jailbreaking Attacks
Guobin Shen, Dongcheng Zhao, Linghao Feng, Xiang He, Jihang Wang, Sicheng Shen, Haibo Tong, Yiting Dong, Jindong Li, Xiang Zheng, Yi Zeng
Large language models (LLMs) have achieved remarkable capabilities but remain
vulnerable to adversarial prompts known as jailbreaks, which can bypass safety
alignment and elicit harmful outputs. Despite growing efforts in LLM safety
research, existing evaluations are often fragmented, focused on isolated attack
or defense techniques, and lack systematic, reproducible analysis. In this
work, we introduce PandaGuard, a unified and modular framework that models LLM
jailbreak safety as a multi-agent system comprising attackers, defenders, and
judges. Our framework implements 19 attack methods and 12 defense mechanisms,
along with multiple judgment strategies, all within a flexible plugin
architecture supporting diverse LLM interfaces, multiple interaction modes, and
configuration-driven experimentation that enhances reproducibility and
practical deployment. Built on this framework, we develop PandaBench, a
comprehensive benchmark that evaluates the interactions between these
attack/defense methods across 49 LLMs and various judgment approaches,
requiring over 3 billion tokens to execute. Our extensive evaluation reveals
key insights into model vulnerabilities, defense cost-performance trade-offs,
and judge consistency. We find that no single defense is optimal across all
dimensions and that judge disagreement introduces nontrivial variance in safety
assessments. We release the code, configurations, and evaluation results to
support transparent and reproducible research in LLM safety.
♻ ★ Compile-Time Fully Homomorphic Encryption of Vectors: Eliminating Online Encryption via Algebraic Basis Synthesis
We propose a framework for compile-time ciphertext synthesis in fully
homomorphic encryption (FHE) systems, where ciphertexts are constructed from
precomputed encrypted basis vectors combined with a runtime-scaled encryption
of zero. This design eliminates online encryption and instead relies solely on
ciphertext-level additions and scalar multiplications, enabling efficient data
ingestion and algebraic reuse. We formalize the method as a randomized
$\mathbb{Z}_t$-module morphism and prove that it satisfies IND-CPA security
under standard assumptions. The proof uses a hybrid game reduction, showing
that adversarial advantage in distinguishing synthesized ciphertexts is
negligible if the underlying FHE scheme is IND-CPA secure. Unlike prior designs
that require a pool of random encryptions of zero, our construction achieves
equivalent security using a single zero ciphertext multiplied by a fresh scalar
at runtime, reducing memory overhead while preserving ciphertext randomness.
The resulting primitive supports efficient integration with standard FHE APIs
and maintains compatibility with batching, rotation, and aggregation, making it
well-suited for encrypted databases, streaming pipelines, and secure compiler
backends.
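A minimal sketch of the synthesis rule E(v) = sum_i v_i * E(e_i) + s * E(0),
using the phe Paillier library as a stand-in since the construction needs only
ciphertext addition and plaintext-scalar multiplication; the paper's batched
FHE setting and formal re-randomization argument are not captured here:

import random
from phe import paillier

pub, priv = paillier.generate_paillier_keypair(n_length=2048)

k = 4                                            # plaintext vector length
enc_basis = [pub.encrypt(1) for _ in range(k)]   # E(e_i), fixed at compile time
enc_zero = pub.encrypt(0)                        # the single encryption of zero

def synthesize(v):
    """Build E(v) coordinate-wise with no online encryption."""
    s = random.getrandbits(64) | 1               # fresh runtime scalar, nonzero
    return [v_i * enc_basis[i] + s * enc_zero for i, v_i in enumerate(v)]

print([priv.decrypt(c) for c in synthesize([7, 0, 3, 9])])  # -> [7, 0, 3, 9]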
♻ ★ Firewalls to Secure Dynamic LLM Agentic Networks
LLM agents will likely communicate on behalf of users with other
entity-representing agents on tasks involving long-horizon plans with
interdependent goals. Current work neglects these agentic networks and their
challenges. We identify required properties for agent communication:
proactivity, adaptability, privacy (sharing only task-necessary information),
and security (preserving integrity and utility against selfish entities). After
demonstrating communication vulnerabilities, we propose a practical design and
protocol inspired by network security principles. Our framework automatically
derives task-specific rules from prior conversations to build firewalls. These
firewalls construct a closed language that is completely controlled by the
developer. They transform any personal data to the allowed degree of
permissibility entailed by the task. Both operations are completely quarantined
from external attackers, disabling the potential for prompt injections,
jailbreaks, or manipulation. By incorporating rules learned from their previous
mistakes, agents rewrite their instructions and self-correct during
communication. Evaluations on diverse attacks demonstrate our framework
significantly reduces privacy and security vulnerabilities while allowing
adaptability.
♻ ★ Discovering Spoofing Attempts on Language Model Watermarks
LLM watermarks stand out as a promising way to attribute ownership of
LLM-generated text. One threat to watermark credibility comes from spoofing
attacks, where an unauthorized third party forges the watermark, enabling it to
falsely attribute arbitrary texts to a particular LLM. Despite recent work
demonstrating that state-of-the-art schemes are, in fact, vulnerable to
spoofing, no prior work has focused on post-hoc methods to discover spoofing
attempts. In this work, we for the first time propose a reliable statistical
method to distinguish spoofed from genuinely watermarked text, suggesting that
current spoofing attacks are less effective than previously thought. In
particular, we show that regardless of their underlying approach, all current
learning-based spoofing methods consistently leave observable artifacts in
spoofed texts, indicative of watermark forgery. We build upon these findings to
propose rigorous statistical tests that reliably reveal the presence of such
artifacts and thus demonstrate that a watermark has been spoofed. Our
experimental evaluation shows high test power across all learning-based
spoofing methods, providing insights into their fundamental limitations and
suggesting a way to mitigate this threat. We make all our code available at
https://github.com/eth-sri/watermark-spoofing-detection .
♻ ★ Reconciling Privacy and Explainability in High-Stakes: A Systematic Inquiry
Deep learning's preponderance across scientific domains has reshaped
high-stakes decision-making, making it essential to follow rigorous operational
frameworks that include both Right-to-Privacy (RTP) and Right-to-Explanation
(RTE). This paper examines the complexities of combining these two
requirements. For RTP, we focus on Differential Privacy (DP), which is
considered the current gold standard for privacy-preserving machine learning
due to its strong quantitative guarantee of privacy. For RTE, we focus on
post-hoc explainers: they are the go-to option for model auditing as they
operate independently of model training. We formally investigate DP models and
various commonly-used post-hoc explainers: how to evaluate these explainers
subject to RTP, and analyze the intrinsic interactions between DP models and
these explainers. Furthermore, our work sheds light on how RTP and RTE can be
effectively combined in high-stakes applications. Our study concludes by
outlining an industrial software pipeline, illustrated with a widely used
use case, that respects both RTP and RTE requirements.
comment: Accepted at TMLR
♻ ★ Differential Privacy in Continual Learning: Which Labels to Update?
The goal of continual learning (CL) is to retain knowledge across tasks, but
this conflicts with the strict privacy required for sensitive training data,
which prevents storing or memorising individual samples. To address this, we combine
CL and differential privacy (DP). We highlight that failing to account for
privacy leakage through the set of labels a model can output can break the
privacy of otherwise valid DP algorithms. This is especially relevant in CL. We
show that mitigating the issue with a data-independent, overly large label space
can have minimal negative impact on utility when fine-tuning a pre-trained
model under DP, while learning the labels with a separate DP mechanism risks
losing small classes.
comment: 39 pages, 13 figures
♻ ★ Detection of Aerial Spoofing Attacks to LEO Satellite Systems via Deep Learning
Detecting spoofing attacks on Low-Earth-Orbit (LEO) satellite systems is a
cornerstone of assessing the authenticity of the received information and
guaranteeing robust service delivery in several application domains. The
solutions available today for spoofing detection either rely on additional
communication systems, receivers, and antennas, or require mobile deployments.
Detection systems working at the Physical (PHY) layer of the satellite
communication link also require time-consuming and energy-hungry training
processes on all satellites of the constellation, and rely on the availability
of spoofed data, which are often challenging to collect. Moreover, none of such
contributions investigate the feasibility of aerial spoofing attacks launched
via drones operating at various altitudes. In this paper, we propose a new
spoofing detection technique for LEO satellite constellation systems, applying
anomaly detection on the received PHY signal via autoencoders. We validate our
solution through an extensive measurement campaign involving the deployment of
an actual spoofer (Software-Defined Radio) installed on a drone and injecting
rogue IRIDIUM messages while flying at different altitudes with various
movement patterns. Our results demonstrate that the proposed technique can
reliably detect LEO spoofing attacks launched at different altitudes, while
state-of-the-art competing approaches simply fail. We also release the
collected data as open source, fostering further research on satellite
security.
comment: Accepted for Publication by Elsevier Computer Networks
♻ ★ Model-based Large Language Model Customization as Service
Prominent Large Language Model (LLM) services from providers like OpenAI and
Google excel at general tasks but often underperform on domain-specific
applications. Current customization services for these LLMs typically require
users to upload data for fine-tuning, posing significant privacy risks. While
differentially private (DP) data synthesis presents a potential alternative,
its application commonly results in low effectiveness due to the introduction
of excessive noise on data for DP. To overcome this, we introduce Llamdex, a
novel framework that facilitates LLM customization as a service, where the
client uploads pre-trained domain-specific models rather than data. This
client-uploaded model, optionally protected by DP with much lower noise, is
inserted into the base LLM via connection modules. Significantly, these
connecting modules are trained without requiring sensitive domain data,
enabling clients to customize LLM services while preserving data privacy.
Experiments demonstrate that Llamdex improves domain-specific accuracy by up to
26% over state-of-the-art private data synthesis methods under identical
privacy constraints and, by obviating the need for users to provide domain
context within queries, maintains inference efficiency comparable to the
original LLM service.
♻ ★ On the Lack of Robustness of Binary Function Similarity Systems
Gianluca Capozzi, Tong Tang, Jie Wan, Ziqi Yang, Daniele Cono D'Elia, Giuseppe Antonio Di Luna, Lorenzo Cavallaro, Leonardo Querzoni
Binary function similarity, which often relies on learning-based algorithms
to identify what functions in a pool are most similar to a given query
function, is a sought-after topic in different communities, including machine
learning, software engineering, and security. Its importance stems from the
impact it has in facilitating several crucial tasks, from reverse engineering
and malware analysis to automated vulnerability detection. Whereas recent work
has cast light on performance for this long-studied problem, the research
landscape remains largely lacking in understanding of the resiliency of
state-of-the-art machine learning models against adversarial attacks. As
security requires reasoning about adversaries, in this work we assess the
robustness of such models through a simple yet effective black-box greedy
attack, which modifies the topology and the content of the control flow of the
attacked functions. We demonstrate that this attack is successful in
compromising all the models, achieving average attack success rates of 57.06%
and 95.81% depending on the problem settings (targeted and untargeted attacks).
Our findings are insightful: top performance on clean data does not necessarily
relate to top robustness properties, which explicitly highlights
performance-robustness trade-offs one should consider when deploying such
models, calling for further research.
♻ ★ Network Intrusion Datasets: A Survey, Limitations, and Recommendations
Data-driven cyberthreat detection has become a crucial defense technique in
modern cybersecurity. Network defense, supported by Network Intrusion Detection
Systems (NIDSs), has also increasingly adopted data-driven approaches, leading
to greater reliance on data. Despite the importance of data, its scarcity has
long been recognized as a major obstacle in NIDS research. In response, the
community has published many new datasets recently. However, many of them
remain largely unknown and unanalyzed, leaving researchers uncertain about
their suitability for specific use cases.
In this paper, we aim to address this knowledge gap by performing a
systematic literature review (SLR) of 89 public datasets for NIDS research.
Each dataset is comparatively analyzed across 13 key properties, and its
potential applications are outlined. Beyond the review, we also discuss
domain-specific challenges and common data limitations to facilitate a critical
view on data quality. To aid in data selection, we conduct a dataset popularity
analysis in contemporary state-of-the-art NIDS research. Furthermore, the paper
presents best practices for dataset selection, generation, and usage. By
providing a comprehensive overview of the domain and its data, this work aims
to guide future research toward improving data quality and the robustness of
NIDS solutions.
comment: 42 pages, 8 figures, 6 tables. Accepted version for the journal
Computers & Security
♻ ★ Anonymity Unveiled: A Practical Framework for Auditing Data Use in Deep Learning Models CCS'25
The rise of deep learning (DL) has led to a surging demand for training data,
which incentivizes the creators of DL models to trawl through the Internet for
training materials. Meanwhile, users often have limited control over whether
their data (e.g., facial images) are used to train DL models without their
consent, which has engendered pressing concerns.
This work proposes MembershipTracker, a practical data auditing tool that can
empower ordinary users to reliably detect the unauthorized use of their data in
training DL models. We view data auditing through the lens of membership
inference (MI). MembershipTracker consists of a lightweight data marking
component to mark the target data with small and targeted changes, which can be
strongly memorized by the model trained on them; and a specialized MI-based
verification process to audit whether the model exhibits strong memorization on
the target samples.
MembershipTracker only requires the users to mark a small fraction of data
(0.005% to 0.1% in proportion to the training set), and it enables the users to
reliably detect the unauthorized use of their data (average 0% FPR@100% TPR).
We show that MembershipTracker is highly effective across various settings,
including industry-scale training on the full-size ImageNet-1k dataset. We
finally evaluate MembershipTracker under multiple classes of countermeasures.
comment: A shorter version of this paper will appear in CCS'25
♻ ★ Timestamp Manipulation: Timestamp-based Nakamoto-style Blockchains are Vulnerable
Nakamoto consensus is the most widely adopted decentralized consensus
mechanism in cryptocurrency systems. Since it was proposed in 2008, many
studies have focused on analyzing its security. Most of them focus on
maximizing the profit of the adversary. Examples include the selfish mining
attack [FC '14] and the recent riskless uncle maker (RUM) attack [CCS '23]. In
this work, we introduce the Staircase-Unrestricted Uncle Maker (SUUM), the
first block withholding attack targeting the timestamp-based Nakamoto-style
blockchain. Through block withholding, timestamp manipulation, and difficulty
risk control, SUUM adversaries are capable of launching persistent attacks with
zero cost and minimal difficulty risk, indefinitely exploiting
rewards from honest participants. This creates a self-reinforcing cycle that
threatens the security of blockchains. We conduct a comprehensive and
systematic evaluation of SUUM, including the attack conditions, its impact on
blockchains, and the difficulty risks. Finally, we further discuss four
feasible mitigation measures against SUUM.
comment: 26 pages, 6 figures
♻ ★ An End-to-End Model For Logits Based Large Language Models Watermarking
The rise of LLMs has increased concerns over source tracing and copyright
protection for AIGC, highlighting the need for advanced detection technologies.
Passive detection methods usually face high false positives, while active
watermarking techniques using logits or sampling manipulation offer more
effective protection. Existing LLM watermarking methods, though effective on
unaltered content, suffer significant performance drops when the text is
modified and could introduce biases that degrade LLM performance in downstream
tasks. These methods fail to achieve an optimal tradeoff between text quality
and robustness, particularly due to the lack of end-to-end optimization of the
encoder and decoder. In this paper, we introduce a novel end-to-end logits
perturbation method for watermarking LLM-generated text. Through joint
optimization, our approach achieves a better balance between quality and
robustness. To address non-differentiable operations in the end-to-end training
pipeline, we introduce an online prompting technique that leverages the
on-the-fly LLM as a differentiable surrogate. Our method achieves superior
robustness, outperforming distortion-free methods by 37-39% under paraphrasing
and 17.2% on average, while maintaining text quality on par with these
distortion-free methods in terms of text perplexity and downstream tasks. Our
method can be easily generalized to different LLMs. Code is available at
https://github.com/KAHIMWONG/E2E_LLM_WM.
♻ ★ DePLOI: Applying NL2SQL to Synthesize and Audit Database Access Control
In every enterprise database, administrators must define an access control
policy that specifies which users have access to which tables. Access control
straddles two worlds: policy (organization-level principles that define who
should have access) and process (database-level primitives that actually
implement the policy). Assessing and enforcing process compliance with a policy
is a manual and ad-hoc task. This paper introduces a new access control model
called Intent-Based Access Control for Databases (IBAC-DB). In IBAC-DB, access
control policies are expressed using abstractions that scale to high numbers of
database objects, and are traceable with respect to implementations. This paper
proposes DePLOI (Deployment Policy Linter for Organization Intents), a
LLM-backed system leveraging access control-specific task decompositions to
accurately synthesize and audit access control implementation from IBAC-DB
abstractions. As DePLOI is the first system of its kind to our knowledge, this
paper further proposes IBACBench, the first benchmark for evaluating the
synthesis and auditing capabilities of DePLOI. IBACBench leverages a
combination of current NL2SQL benchmarks, real-world role hierarchies and
access control policies, and LLM-generated data. We find that DePLOI achieves
high synthesis accuracies and auditing F1 scores overall, and greatly
outperforms other LLM prompting strategies (e.g., by 10 F1 points).
comment: 13 pages, 5 figures, 2 tables